Learning 5000 Relational Extractors
Authors
Abstract
Many researchers are trying to use information extraction (IE) to create large-scale knowledge bases from natural language text on the Web. However, the primary approach (supervised learning of relation-specific extractors) requires manually-labeled training data for each relation and doesn’t scale to the thousands of relations encoded in Web text. This paper presents LUCHS, a self-supervised, relation-specific IE system which learns 5025 relations — more than an order of magnitude greater than any previous approach — with an average F1 score of 61%. Crucial to LUCHS’s performance is an automated system for dynamic lexicon learning, which allows it to learn accurately from heuristically-generated training data, which is often noisy and sparse.
Similar resources
Distant Supervised Relation Extraction with Wikipedia and Freebase
In this paper we discuss a new approach to extracting relational data from unstructured text without the need for hand-labeled data. So-called distant supervision has the advantage that it scales to large amounts of web data and therefore fulfills the requirement of current information extraction tasks. As opposed to supervised machine learning, we train generic, relation- and domain-independent extractor...
Data Mining on Symbolic Knowledge Extracted from the Web
Information extractors and classifiers operating on unrestricted, unstructured texts are an errorful source of large amounts of potentially useful information, especially when combined with a crawler which automatically augments the knowledge base from the world-wide web. At the same time, there is much structured information on the World Wide Web. Wrapping the web-sites which provide this kind...
Interactive Learning of Relation Extractors with Weak Supervision
Mining: Foundations, Techniques and Applications. Finite-State Transducers for Semi-Structured Text
Text mining for semi-structured documents requires information extractors. Programming extractors by hand cannot keep up with the amount and variety of documents placed on the World Wide Web every day. This paper presents our recent result on applying machine learning techniques to automate the generation of the extractors. Our goal is to develop a domain and language indepen...
Learning audio and image representations with bio-inspired trainable feature extractors
Recent advancements in pattern recognition and signal processing concern the automatic learning of data representations from labeled training samples. Typical approaches are based on deep learning and convolutional neural networks, which require large amounts of labeled training samples. In this work, we propose novel feature extractors that can be used to learn the representation of single prot...